{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Structured Input for LLMs\n",
    "\n",
    "It has been observed that most LLMs perfom better when prompted with XML-like content (you can see it in [Anthropic's prompting guide](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags), for instance).\n",
    "\n",
    "We could refer to this kind of prompting as _structured input_, and LlamaIndex offers you the possibility of chatting with LLMs exactly through this technique - let's go through an example in this notebook!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Install Needed Dependencies\n",
    "\n",
    "> _Make sure to have `llama-index>=0.12.34` installed if you wish to follow this tutorial along without any problem😄_\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.6/7.6 MB\u001b[0m \u001b[31m65.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m284.6/284.6 kB\u001b[0m \u001b[31m21.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m41.0/41.0 kB\u001b[0m \u001b[31m2.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m40.4/40.4 kB\u001b[0m \u001b[31m2.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m309.7/309.7 kB\u001b[0m \u001b[31m23.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.2/1.2 MB\u001b[0m \u001b[31m55.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m50.9/50.9 kB\u001b[0m \u001b[31m3.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m129.3/129.3 kB\u001b[0m \u001b[31m9.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25h\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
      "ipython 7.34.0 requires jedi>=0.16, which is not installed.\u001b[0m\u001b[31m\n",
      "\u001b[0m"
     ]
    }
   ],
   "source": [
    "! pip install -q llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Version: 0.12.50\n"
     ]
    }
   ],
   "source": [
    "! pip show llama-index | grep \"Version\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Create a Prompt Template\n",
    "\n",
    "In order to use the structured input, we need to create a prompt template that would have a [Jinja](https://jinja.palletsprojects.com/en/stable/) expression (recognizable by the `{{}}`) with a specific filter (`to_xml`) that will turn inputs such as Pydantic `BaseModel` subclasses, dictionaries or JSON-like strings into XML representations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.prompts import RichPromptTemplate\n",
    "\n",
    "template_str = \"Please extract from the following XML code the contact details of the user:\\n\\n```xml\\n{{ data | to_xml }}\\n```\\n\\n\"\n",
    "prompt = RichPromptTemplate(template_str)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's now try to format the input as a string, using different objects as `data`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "Please extract from the following XML code the contact details of the user:\n",
       "\n",
       "```xml\n",
       "<user>\n",
       "\t<name>John</name>\n",
       "\t<surname>Doe</surname>\n",
       "\t<age>30</age>\n",
       "\t<email>john.doe@example.com</email>\n",
       "\t<phone>123-456-7890</phone>\n",
       "\t<social_accounts>{'bluesky': 'john.doe', 'instagram': 'johndoe1234'}</social_accounts>\n",
       "</user>\n",
       "\n",
       "```\n"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Using a BaseModel\n",
    "\n",
    "from pydantic import BaseModel\n",
    "from typing import Dict\n",
    "from IPython.display import Markdown, display\n",
    "\n",
    "\n",
    "class User(BaseModel):\n",
    "    name: str\n",
    "    surname: str\n",
    "    age: int\n",
    "    email: str\n",
    "    phone: str\n",
    "    social_accounts: Dict[str, str]\n",
    "\n",
    "\n",
    "user = User(\n",
    "    name=\"John\",\n",
    "    surname=\"Doe\",\n",
    "    age=30,\n",
    "    email=\"john.doe@example.com\",\n",
    "    phone=\"123-456-7890\",\n",
    "    social_accounts={\"bluesky\": \"john.doe\", \"instagram\": \"johndoe1234\"},\n",
    ")\n",
    "\n",
    "display(Markdown(prompt.format(data=user)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "Please extract from the following XML code the contact details of the user:\n",
       "\n",
       "```xml\n",
       "<input>\n",
       "\t<name>John</name>\n",
       "\t<surname>Doe</surname>\n",
       "\t<age>30</age>\n",
       "\t<email>john.doe@example.com</email>\n",
       "\t<phone>123-456-7890</phone>\n",
       "\t<social_accounts>{'bluesky': 'john.doe', 'instagram': 'johndoe1234'}</social_accounts>\n",
       "</input>\n",
       "\n",
       "```\n"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# with a dictionary\n",
    "\n",
    "user_dict = {\n",
    "    \"name\": \"John\",\n",
    "    \"surname\": \"Doe\",\n",
    "    \"age\": 30,\n",
    "    \"email\": \"john.doe@example.com\",\n",
    "    \"phone\": \"123-456-7890\",\n",
    "    \"social_accounts\": {\"bluesky\": \"john.doe\", \"instagram\": \"johndoe1234\"},\n",
    "}\n",
    "\n",
    "display(Markdown(prompt.format(data=user_dict)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "Please extract from the following XML code the contact details of the user:\n",
       "\n",
       "```xml\n",
       "<input>\n",
       "\t<name>John</name>\n",
       "\t<surname>Doe</surname>\n",
       "\t<age>30</age>\n",
       "\t<email>john.doe@example.com</email>\n",
       "\t<phone>123-456-7890</phone>\n",
       "\t<social_accounts>{'bluesky': 'john.doe', 'instagram': 'johndoe1234'}</social_accounts>\n",
       "</input>\n",
       "\n",
       "```\n"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Using a JSON-like string\n",
    "\n",
    "user_str = '{\"name\":\"John\",\"surname\":\"Doe\",\"age\":30,\"email\":\"john.doe@example.com\",\"phone\":\"123-456-7890\",\"social_accounts\":{\"bluesky\":\"john.doe\",\"instagram\":\"johndoe1234\"}}'\n",
    "\n",
    "display(Markdown(prompt.format(data=user_str)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Chat With an LLM\n",
    "\n",
    "Now that we know how to produce structured input, let's employ it to chat with an LLM!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "··········\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "from getpass import getpass\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = getpass()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.openai import OpenAI\n",
    "\n",
    "llm = OpenAI(model=\"gpt-4.1-mini\")\n",
    "\n",
    "response = await llm.achat(prompt.format_messages(data=user))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The contact details of the user are:\n",
      "\n",
      "- Email: john.doe@example.com  \n",
      "- Phone: 123-456-7890  \n",
      "- Social Accounts:  \n",
      "  - Bluesky: john.doe  \n",
      "  - Instagram: johndoe1234\n"
     ]
    }
   ],
   "source": [
    "print(response.message.content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Use Structured Input and Structured Output\n",
    "\n",
    "Combining structured input and structured output might really help to boost the reliability of the outputs of your LLMs - so let's give it a go!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pydantic import Field\n",
    "from typing import Optional\n",
    "\n",
    "\n",
    "class SocialAccounts(BaseModel):\n",
    "    instagram: Optional[str] = Field(default=None)\n",
    "    bluesky: Optional[str] = Field(default=None)\n",
    "    x: Optional[str] = Field(default=None)\n",
    "    mastodon: Optional[str] = Field(default=None)\n",
    "\n",
    "\n",
    "class ContactDetails(BaseModel):\n",
    "    email: str\n",
    "    phone: str\n",
    "    social_accounts: SocialAccounts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sllm = llm.as_structured_llm(ContactDetails)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "structured_response = await sllm.achat(prompt.format_messages(data=user))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "john.doe@example.com\n",
      "123-456-7890\n",
      "johndoe1234\n",
      "john.doe\n"
     ]
    }
   ],
   "source": [
    "print(structured_response.raw.email)\n",
    "print(structured_response.raw.phone)\n",
    "print(structured_response.raw.social_accounts.instagram)\n",
    "print(structured_response.raw.social_accounts.bluesky)"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
